From the GDM website: ‘The R package gdm implements Generalized Dissimilarity Modeling(Ferrier et al. 2007) to analyze and map spatial patterns of biodiversity. GDM models biological variation as a function of environment and geography using distance matrices – specifically by relating biological dissimilarity between sites to how much sites differ in their environmental conditions (environmental distance) and how isolated they are from one another (geographical distance). […] GDM also can be used to model other biological levels of organization, notably genetic (Fitzpatrick and Keller 2015) [..] and the approaches for doing so are largely identical to the species-level case with the exception of using a different biological dissimilarity metric depending on the type of response variable.’
2 Formatting data
2.1 Genomic data
2.1.1 Pairwise \(F_{ST}\) matrices
The GDM analysis takes as inputs matrices of pairwise \(F_{ST}\).
To estimate the pairwise \(F_{ST}\), we use the individual-level (i.e. allele counts for each genotype) genomic data with missing data (i.e. no imputation of the missing data), and without minor allele frequencies.
We use the Weir and Cockerham method, with the function pairwise.WCfst of the hierfstat package. We could have used the function gene.dist (with the option WC84), as in Gougherty, Keller, and Fitzpatrick (2021) (and the preprint Gougherty et al. 2020). gene.dist keeps only the lower triangle of the \(F_{ST}\) matrix, while pairwise.WCfst keeps the whole matrix.
Code
# we load the allele counts of each genotypegeno <-read.csv(here("data/DryadRepo/FormattedFilteredGenomicData_AlleleCounts_withoutmaf.csv"), row.names =1)geno[geno ==1] <-12geno[geno ==2] <-22geno[geno ==0] <-11geno <- geno %>%t() %>%as.data.frame() %>% dplyr::mutate(pop=substr(row.names(.), 0, 3)) %>% dplyr::select(pop, everything())geno[1:10,1:10] %>%kable_mydf(boldfirstcolumn = T)
The pairwise \(F_{ST}\) matrices contains some negative values:
28 in the matrix of the reference SNPs.
146 in the matrix of the merged candidates.
86 in the matrix of the common candidates.
It generally means there is more variation within than among populations and is likely to result from uneven sample sizes.
2.1.3 Scaling the \(F_{ST}\) matrices
We have to scale the \(F_{ST}\) matrices between 0 and 1 to facilitate model convergence and to enable comparisons the different sets of SNPs, which display different ranges of observed \(F_{ST}\) values.
We standardize the \(F_{ST}\) values in the following way:
fst_matrices <-sapply(fst_matrices, function(x){ (x-min(x,na.rm=T))/(max(x,na.rm=T)-min(x,na.rm=T))}, USE.NAMES =TRUE,simplify=FALSE)# To check that it worked# sapply(fst_matrices, function(x){range(x,na.rm=T)}, USE.NAMES = TRUE,simplify=FALSE)
2.1.4 Matrix visualization
Code
source(here("scripts/functions/corpmat.R"))col <-colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))viz_matrix <-function(x,plot_title) { p.mat <-corpmat(x) p <- corrplot::corrplot(x, method="color", col=col(200),type="upper", order="hclust", addCoef.col ="black", # Add coefficient of correlationtl.col="black", tl.srt=23, #Text label color and rotation# Combine with significancep.mat = p.mat, sig.level =0.05, insig ="blank", number.cex =0.8,tl.cex =0.8,# hide correlation coefficient on the principal diagonaldiag=FALSE,title = plot_title,mar=c(0,0,2,0)) # add an upper margin to see the titlereturn(p)}
We do not have to scale the climatic data before the GDM analysis, as scaling of predictors is part of model fitting in GDM, as said in Mokany et al. (2022): ‘As different predictors are measured on different scales (e.g., temperature in degrees, precipitation in mm), they are transformed as part of model fitting, such that the transformed distance between a pair of sites for different predictors can be meaningfully compared and combined.’
Precipitation seasonality (coefficient of variation)
index
SHM
Summer heat moisture index
°C / mm
Code
# Loading point estimate climatic dataadj <-"noADJ"# not adjusted for elevationref_period <-"ref_1901_1950"# reference period 1901-1950clim_past <-readRDS(here(paste0("data/ClimaticData/MaritimePinePops/ClimatePopulationLocationPointEstimates_ReferencePeriods_",adj,".rds")))[[ref_period]]$ref_means %>% dplyr::select(pop,longitude,latitude,any_of(clim_var))# Loading future climatic data extracted from the rasters for each GCMlist_clim_fut <-readRDS(here("data/ClimaticData/MaritimePinePops/ClimatePopulationLocationValuesExtractedFromRasters_FiveGCMs_2041-2070_SSP370.rds")) %>%lapply(function(clim_fut){clim_fut %>% dplyr::select(pop,gcm,any_of(clim_var))})
Warning! (from the GDM website) Note that if your site coordinates are longitude-latitude, the calculation of geographic distances between sites will have errors, the size of which will depend on the geographic extent and location of your study region. We hope to deal with this in a later release, but for now you can avoid these problems by using a projected coordinate system (e.g., equidistant).
Therefore, we reproject the geographic (spheroid) CRS of the population coordinates (with units in degrees longitude and latitude) to a projected (two-dimensional; cartesian) CRS (typically with units of meters from a datum). We will use the CRS EPSG:3035.
Code
# Steps:# extract the geographic coordinates of the populations# transform the coordinates in spatial points and specify the CRS (WGS84) with sp package# transform to a SpatVector object for the terra package# reproject with terra package in CRS EPSG:3035# extract population coordinates from the SpatVector with terra packagepop_coord_proj <- clim_past %>% dplyr::select(contains("ude")) %>% sp::SpatialPoints(proj4string =CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")) %>% terra::vect() %>% terra::project("EPSG:3035") %>% terra::crds() %>%as_tibble() %>% dplyr::rename(long_EPSG_3035=x, lat_EPSG_3035=y)# merge the population coordinates with the climatic variablesclim_past <-bind_cols(clim_past,pop_coord_proj) %>% dplyr::select(-contains("ude"))list_clim_fut <-lapply(list_clim_fut, function(clim_fut) bind_cols(clim_fut,pop_coord_proj) %>% dplyr::select(-contains("ude")))
3 GDM fitting
Combining genomic and climatic data with formatsitepair
We use the formatsitepair function to combine the genomic and climatic data into population-pair table format. In our case, we use a pairwise biological dissimilarity matrix (i.e. pairwise \(F_{ST}\)) as response variable, and therefore we have to set bioFormat=3 in the formatsitepair function.
Warning! The rows and columns of the distance matrix have to be in the same order as the rows of the climatic variables. And the distance matrix and the climatic data must not include NAs.
GDM fitting
From the GDM website: ’GDM is a nonlinear extension of permutational matrix regression that uses flexible splines and generalized linear modeling (GLM) to accommodate two types of nonlinearity common in ecological datasets: (1) variation in the rate of compositional turnover (non-stationarity) along environmental gradients, and (2) the curvilinear relationship between biological distance and environmental and geographical distance.
Generalized dissimilarity models are fitted with the gdm function.
Different options are available:
geo=T to specify that the model should be fit with geographical distance.
splines an optional vector specifying the the number of I-spline basis functions (the default is three, with larger values producing more complex splines).
knots an optional vector specifying the locations of “knots” along the splines (defaults to 0 (minimum), 50 (median), and 100 (maximum) quantiles when three I-spline basis functions are used). E
From the GDM website: ‘Even though these option are available, using the default values for these parameters will work fine for most applications. In other words, unless you have a good reason, you should probably use the default settings for splines and knots. The effects (and significance) of altering the number of splines and knot locations has not been systematically explored.’
plots <-lapply(snp_sets, function(x){tibble(obs=x$gdm_mod$observed, # can also be extracted with x$gdm_tab$distancepred=x$gdm_mod$predicted) %>%ggplot(aes(x=obs,y=pred)) +geom_abline(intercept =0, slope =1, color="gray60") +geom_point(color=rgb(0,0,1,0.5)) +xlim(c(0,1)) +ylim(c(0,1)) +xlab("Observed genetic distance") +ylab("Predicted genetic distance") +ggtitle(x$set_name) +theme_bw()})plot_grid(plotlist=plots, nrow=1)
I-splines
From the GDM website: ‘The fitted I-splines provide an indication of how population genetic composition changes along each climatic gradient. They are one of the most informative components of a fitted GDM and so plotting and scrutinizing the splines is a major part of interpreting GDM and the analyzed biological patterns.’
I-splines are shown for predictors with non-zero coefficients, i.e. predictors with a relationship with the genetic distance.
I-spline interpretation: (from the GDM website) ’The maximum height of each spline indicates the magnitude of total genetic change along that gradient and thereby corresponds to the relative importance of that predictor in contributing to allelic turnover while holding all other variables constant (i.e., is a partial climatic distance). The spline’s shape indicates how the rate of genetic change varies with position along that gradient. Thus, the splines provide insight into the total magnitude of genetic change as a function of each gradient and where along each gradient those changes are most pronounced.
Warning! Note that the formatsitepair function assumes that the coordinates of the sites are in the same coordinate system as the rasters. At present, no checking is performed to ensure this is the case..
6 GDM predictions
From the GDM website: A GDM model can be used to (i) predict the genetic dissimilarity between population pairs in space or between times using the predict function and (ii) transform the predictor variables from their arbitrary climatic scales to a common biological importance scale using the gdm.transform function.
6.1 Genomic offset of the populations
We want to predict the expected degree of maladaptation in units of the response variable (\(F_{ST}\)) for each population.
6.1.1 Using spatial points
For that, we create a dataset with:
the genetic distance fixed to O. Indeed, we want the disruption of the current gene-climate relationships under future climates, so we assume that the genetic composition is the same between current and future climate to do this calculation.
weights are fixed to 1.
Code
# dataframe with past climatic datapop_tab_pred_clim_past <- clim_past %>% dplyr::select(-pop) %>% dplyr::rename(xCoord=long_EPSG_3035,yCoord=lat_EPSG_3035) %>%set_colnames(paste0("s1.",colnames(.)))# list of dataframe with future climatic data for each GCMlist_pop_tab_pred <- list_clim_fut %>%lapply(function(clim_fut){clim_fut <- clim_fut %>% dplyr::select(-pop,-gcm) %>% dplyr::rename(xCoord=long_EPSG_3035,yCoord=lat_EPSG_3035) %>%set_colnames(paste0("s2.",colnames(.)))bind_cols(pop_tab_pred_clim_past, clim_fut) %>%mutate(distance=0,weights=0) %>% dplyr::select(distance,weights,contains("Coord"), everything())})# calculate the genomic offset for each GCM and each snp setsnp_sets <-lapply(snp_sets, function(x){x$go <-lapply(list_pop_tab_pred, function(pop_tab_pred){predict(x$gdm_mod,pop_tab_pred)})return(x)})
Code
# Function to plot the relationship between the Euclidean climatic distance and the GDM genetic offsetsource(here("scripts/functions/make_eucli_plot.R"))# Calculate the Euclidean climatic distancelist_dist_env <- list_clim_fut %>%lapply(function(clim_fut){Delta = clim_past %>% dplyr::select(-pop,-contains("EPSG")) - clim_fut %>% dplyr::select(-pop,-gcm,-contains("EPSG")) dist_env =sqrt( rowSums(Delta^2) )})# Main gene pools (for the figures)gps <-readRDS(here("data/GenomicData/MainGenePoolPopulations.rds")) %>%arrange(pop)
# Function to make the genomic offset mapssource(here("scripts/functions/make_go_map.R"))# Population coordinatespop_coord <-readRDS(here(paste0("data/ClimaticData/MaritimePinePops/ClimatePopulationLocationPointEstimates_ReferencePeriods_",adj,".rds")))[[ref_period]]$ref_means %>% dplyr::select(pop,longitude,latitude)# Find minimum and maximum values of genomic offset for the mapsgo_limits <-lapply(snp_sets, function(x) {lapply(names(list_dist_env), function(gcm){x$go[[gcm]]}) %>%unlist()}) %>%unlist() %>%range()# Generate the maps for each set of SNPs and each GCMlapply(snp_sets, function(x) {go_maps <-lapply(names(list_clim_fut), function(gcm){make_go_map(dfcoord=pop_coord,snp_set = x,gcm=gcm,ggtitle=gcm,go_limits = go_limits,point_size =3)})legend_maps <-get_legend(go_maps[[1]])go_maps <-lapply(go_maps, function(y) y +theme(legend.position ="none"))go_maps$legend_maps <- legend_mapsgo_maps <-plot_grid(plotlist=go_maps)title <-ggdraw() +draw_label( x$set_name,fontface ='bold',x =0,hjust =0 ) +theme(plot.margin =margin(0, 0, 0, 7))# merge title and plotsplot_grid( title, go_maps,ncol =1,rel_heights =c(0.1, 1)) })
6.1.2 Using rasters
Code
# Past rasterspath <-here("data/ClimaticData/ClimateDTRasters/1km_1901-1950_Extent-JulietteA/")tif_paths <-lapply(clim_var, function(x) paste0(path,"/",x,".tif"))past_rasts <- raster::stack(tif_paths) %>%projectRaster(crs ="EPSG:3035")list_pred_rasts <-lapply(snp_sets, function(x){ # for each snp setlapply(names(list_clim_fut), function(gcm){ # for each GCM# Future rasters path <-here(paste0("data/ClimaticData/ClimateDTRasters/1km_",gcm,"_2041-2070_ssp370_Extent-JulietteA/"))tif_paths <-lapply(clim_var, function(x) paste0(path,"/",x,".tif"))fut_rasts <- raster::stack(tif_paths)fut_rasts <-projectRaster(fut_rasts, crs ="EPSG:3035")predict(x$gdm_mod, past_rasts, # raster with current climate at 2.5 minutes resolutiontime=TRUE, fut_rasts) # Rasters with future climates at 2.5 minutes resolution}) %>%setNames(names(list_clim_fut))})list_pred_rasts %>%saveRDS(here("outputs/GDM/go_pred_rasters.rds"))
6.1.3 Corr btw predictions with rasters or spatial points
We check that the genomic offset predictions we obtained with the spatial points are highly correlated (also similar) to those obtained with the rasters.
# Load the climatic data of the NFI plots.nfi_clim <-readRDS(here("data/ClimaticData/NFIplots/NFIclimate.rds")) # Reproject the plotcode coordinates in the projection system `EPSG:3035`.plotcoord_EPSG_3035 <- nfi_clim[[1]] %>% dplyr::select(contains("ude")) %>% sp::SpatialPoints(proj4string =CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")) %>% terra::vect() %>% terra::project("EPSG:3035") %>% terra::crds() %>%as_tibble() %>% dplyr::rename(xCoord=x, yCoord=y)# Keep only the climatic variables of interest and the plotcode coordinatesnfi_clim <- nfi_clim %>%lapply(function(x){x %>%bind_cols(plotcoord_EPSG_3035) %>% dplyr::select(contains("Coord"),any_of(clim_var))})# Format as GDM inputsnfi_clim[[1]] <- nfi_clim[[1]] %>%set_colnames(paste0("s1.",colnames(.)))nfi_clim[[2]] <- nfi_clim[[2]] %>%set_colnames(paste0("s2.",colnames(.)))nfi_clim <-bind_cols(nfi_clim) %>%mutate(distance=0,weights=0) %>% dplyr::select(distance,weights,contains("Coord"), everything())
source(here("scripts/functions/make_go_map.R"))# Find minimum and maximum values of genomic offset for the mapsgo_limits <-lapply(snp_sets, function(snp_set) snp_set$go_nfi) %>%unlist() %>%range()# The minimum GO value is very very small, almost zero, so we fix it to zero.go_limits[[1]] <-0lapply(snp_sets, function(x) make_go_map(dfcoord=readRDS(here("data/ClimaticData/NFIplots/NFIclimate.rds"))[[1]] %>% dplyr::select(contains("ude")), snp_set = x,type="NFI",point_size =0.5,go_limits = go_limits,legend_position =c(0.85,0.15),legend_box_background ="gray80",y_limits =c(35, 51)))
8 Validation - Common gardens
Climatic data and coordinates
Code
# format climatic datacg_clim <-readRDS(here("data/ClimaticData/CommonGardens/ClimateCG.rds")) %>% dplyr::select(any_of(clim_var)) %>%set_colnames(paste0("s2.",colnames(.))) %>%replicate(nrow(pop_tab_pred_clim_past),.,simplify=F) %>%bind_rows() %>%bind_cols(slice(pop_tab_pred_clim_past, rep(1:n(), each =5))) %>%mutate(s2.xCoord=s1.xCoord, # same coordinates, so that only climatic differences are considereds2.yCoord=s1.yCoord,distance=0,weights=0) %>% dplyr::select(distance, weights, s1.xCoord, s1.yCoord, s2.xCoord, s2.yCoord,contains("s1"), contains("s2"))
Ferrier, Simon, Glenn Manion, Jane Elith, and Karen Richardson. 2007. “Using Generalized Dissimilarity Modelling to Analyse and Predict Patterns of Beta Diversity in Regional Biodiversity Assessment.”Diversity and Distributions 13 (3): 252–64.
Fitzpatrick, Matthew C, and Stephen R Keller. 2015. “Ecological Genomics Meets Community-Level Modelling of Biodiversity: Mapping the Genomic Landscape of Current and Future Environmental Adaptation.”Ecology Letters 18 (1): 1–16.
Gougherty, Andrew V, Stephen R Keller, and Matthew C Fitzpatrick. 2021. “Maladaptation, Migration and Extirpation Fuel Climate Change Risk in a Forest Tree Species.”Nature Climate Change 11 (2): 166–71.
Mokany, Karel, Chris Ware, Skipton NC Woolley, Simon Ferrier, and Matthew C Fitzpatrick. 2022. “A Working Guide to Harnessing Generalized Dissimilarity Modelling for Biodiversity Analysis and Conservation Assessment.”Global Ecology and Biogeography 31 (4): 802–21.